ABSTRACT


Multiversion programming is a redundancy approach to developing highly
reliable software. In applications of this method, two or more versions of a 
program are developed independently by different programmers and the versions 
then combined to form a redundant system. One variation of this approach 
consists of developing a set of n program versions and testing the versions to 
predict the failure probability of a particular program or a system formed 
from a subset of the programs. In this paper we examine the precision that 
might be obtained, and also the effect of programmer variability if
predictions are made over repetitions of the process of generating different 
program versions. 

Key Words: N-version programming, multiversion software, binomial mixture
sampling model, failure intensity, intensity distribution, estimation.
1. INTRODUCTION

N-version or multiversion programming, originally proposed by Avizienis 
[1], is a redundancy method of structuring software components to cope with 
residual software design faults. Ideally, the use of multiple versions will 
greatly decrease the probability that a majority of the versions fail on the 
same input, thus providing a system having greater reliability than a single 
version. The N-version method involves independently generating $N \ge 2$
versions of a program and running them concurrently, all versions receiving 
the same inputs and producing their own outputs. The outputs of the programs 
are compared by a voter to determine, for each input, a majority decision 
output. 
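The voting step can be sketched in a few lines. This is a minimal illustration only, assuming outputs are hashable values that can be compared for exact equality; the function name `majority_vote` and the sample outputs are hypothetical and do not come from the paper.

```python
from collections import Counter

def majority_vote(outputs):
    """Return the output produced by a strict majority of versions, else None."""
    value, count = Counter(outputs).most_common(1)[0]
    return value if count > len(outputs) / 2 else None

# Three hypothetical versions run on the same input; one version disagrees.
print(majority_vote([42, 42, 41]))
# No two versions agree, so there is no majority decision.
print(majority_vote([1, 2, 3]))
```

A system built this way fails on an input only when a majority of its versions produce incorrect output, which is the event modeled in the later sections.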

To model the effect of using multiple versions, several assumptions are 
required concerning the process by which programs are created and are run in 
an operational setting. The model we proposed in an earlier paper (Eckhardt 
and Lee, [2]) assumes that if program versions are generated by physically 
separated programmers or programming teams and according to a common set of 
requirements, they are, in some sense, independent and identically distributed 
objects. Littlewood and Miller [3] argue in favor of dropping the identically 
distributed assumption to model the effect of diverse methodologies (i.e., the 
use of different languages, development environments, etc.). In the absence 
of any systematic differences between versions (i.e., no forced diversity), 
the present paper is concerned with analyzing the predictive information 
obtained when testing multiple versions of a program. 
2. DESCRIPTION


The purpose here is to describe modeling considerations for failure 
probability estimation. Predictive information may be measured as the 
precision with which one can estimate the failure probability of a given 
program or a given system of programs. It may also be measured as the 
precision obtained when estimating probabilities across a population of 
programs. Since it is unlikely that small probabilities can be estimated for 
a population of programmers with adequate precision, our concern is the 
precision obtained when estimating the failure probability of a particular 
program or system. 

Uncertainty associated with the process of testing refers to whether a
program fails on an input condition and whether the input condition ever 
occurs during testing. Failure is an event realized if a program produces 
incorrect output. However, a failure is recorded during testing only if there 
is a mechanism by which an error is detected; with multiple versions the 
testing process can be automated since different versions provide a basis for 
error detection. An important assumption is that the occurrence of identical
incorrect output among the available programs has a very small probability, in
comparison to other types of errors, so that errors are detected if they occur 
and the true failure probabilities are as low as indicated by the data 
obtained by testing. 

Uncertainty concerning the failure of a program is conceptually 
inseparable from uncertainty about the effect of the programmer. Modeling the 
occurrence of failures is most easily motivated in terms of a sampling model 
in which one imagines that programmers are picked at random. Such a model may
not be the correct model, and this may have some effect when estimating small 
probabilities. Our purpose, rather than explore modeling possibilities, is to 
give a basis for the proposition that small probabilities may be estimated 
with adequate precision. 

Since a model is a necessity in this context, we relate a sampling model 
under which failure data may be obtained to a theoretical model for the 
failure probability of a system having N component versions. When a larger
set of n versions is run on a random input series $X_1, X_2, \ldots, X_k$,
summary information is provided by counts $Y_1, Y_2, \ldots, Y_k$ of the numbers
of versions which fail simultaneously (i.e., on the same input condition).

This information may be summarized and conveyed through a cumulative 
distribution function (cdf). This cdf indicates the tendency for program 
versions to fail together as percentages of inputs on which various
percentages of the versions fail.
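As a rough illustration of this summary, the sketch below turns a list of failure counts into an empirical cdf of the proportion of versions failing. The counts are made up for the example and the function name `empirical_cdf` is an assumption, not something defined in the paper.

```python
def empirical_cdf(counts, n):
    """Pairs (j/n, fraction of inputs on which at most j of the n versions fail)."""
    k = len(counts)
    return [(j / n, sum(1 for y in counts if y <= j) / k) for j in range(n + 1)]

# Hypothetical failure counts for k = 8 inputs and n = 3 versions.
counts = [0, 0, 1, 0, 2, 0, 0, 1]
for proportion, cumulative in empirical_cdf(counts, 3):
    print(f"{proportion:.3f}  {cumulative:.3f}")
```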

Section 4 describes a sampling model and a theoretical model for a system 
of N versions. In section 5 we consider a statistic $\tilde{P}_N$ and describe a
framework within which $\tilde{P}_N$ has the optimal property of minimum variance
within a restricted class of statistics. In section 6 we obtain the variance
of $\tilde{P}_N$. There and in section 7 we consider an interpretation of variability
with the emphasis in section 7 being on the precision of estimates as obtained 
from the Knight and Leveson [4] failure data. 

3. NOTATION 

$\Omega$                    Input space for programs designed to a common set of
                            requirements

$\theta(x)$                 Probability a program fails on input x

$V(x)$                      The binary random variable defined by $V(x) = 1$ if a
                            program fails on input x and $V(x) = 0$, otherwise

$Q$                         $Q(A)$ is the probability an input occurs in the set A

$X_i$                       A randomly selected input condition

$G(z)$                      A cdf giving the theoretical probability an input occurs
                            in some subset of the input space for which
                            $\theta(x) \le z$

k, n, N                     The number of input test cases, program versions, and
                            component versions of a system, respectively

$Y_1, Y_2, \ldots, Y_k$     Counts of the numbers of versions that fail
                            simultaneously on successive random inputs

$G_n(z)$                    The cdf of $n^{-1} Y_i$

$\pi_j$                     $P(Y_i = j)$, j = 0,1,2,...,n

$P_N$                       Failure probability of a system having N component
                            versions

p                           Failure probability of a single version

$S_{jn}$                    Input frequency of j failures among n versions

$S_n$                       $(S_{1n}, S_{2n}, \ldots, S_{nn})$

$\tilde{P}_N$, $\hat{p}$    An unbiased statistic for estimating $P_N$, p,
                            respectively

$k^{-1/2}\tau$              The standard deviation of $\tilde{P}_N$

4. THE SAMPLING MODEL AND RELATED PARAMETERS. 

We first describe the assumed sampling model. Let $\Omega$ be the common input
space of the software versions and let $\theta(x)$ be the probability a version
fails on input x in $\Omega$; $\theta(x)$ is called the failure intensity. Define a
collection of binary random variables $V(x)$, $x \in \Omega$, by $V(x) = 1$ if a
version fails on input x and $V(x) = 0$, otherwise. For a set of n versions,
similarly define $V_1(x), V_2(x), \ldots, V_n(x)$, $x \in \Omega$. The assumptions
concerning the process of developing the programs and the input process are
the following:
A1  $\{V_1(x), x \in \Omega\}, \{V_2(x), x \in \Omega\}, \ldots, \{V_n(x), x \in \Omega\}$
    are independent collections of random variables and for each
    $x \in \Omega$, $V_1(x), V_2(x), \ldots, V_n(x)$ are identically distributed.

A2  An input series $X_1, X_2, \ldots$ is stationary and independent; the
    probabilities $Q(A) = P(X_i \in A)$ are given by a usage distribution Q.

A3  Failure counts $Y_i = \sum_{j=1}^{n} V_j(X_i)$, $i = 1,2,\ldots,k$, on
    successive random inputs are independent random variables.

A1 and A2 are the assumptions of the model we described in [2].

Although failures of the programs can be observed individually, it 
suffices for much of our discussion to consider the implications of A1-A3 for 
the series of failure counts $Y_1, Y_2, \ldots, Y_k$. From A1 and A2 each $Y_i$
is conditionally, given $X_i = x$, binomial with parameters $(n, \theta(x))$.
Unconditionally, $Y_1, Y_2, \ldots, Y_k$ are identically distributed and the
distribution function of $n^{-1} Y_i$ is

$$G_n(z) = \sum_{j \le nz} \int \binom{n}{j} u^j (1 - u)^{n-j} \, dG(u) \qquad (1)$$

where

$$G(z) = \int_{\{x:\, \theta(x) \le z\}} dQ \qquad (2)$$
For fixed z, $G(z)$ is the probability that random inputs occur in subsets of
the input space for which $\theta(x) \le z$. (The notation $G_n(z)$ in (1) is rather
weakly justified by the fact that $G_n$ converges to $G$ as n increases (Renyi,
[5], p. 318)). By A3, $Y_1, Y_2, \ldots, Y_k$ are then independent and
identically distributed with the distribution function in (1), and we refer to
this distribution as a binomial mixture sampling model.
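To make the binomial mixture concrete, the following sketch evaluates the distribution of a failure count exactly when G is a discrete distribution. The two-point G used here is purely hypothetical, and the names `mixture_pmf` and `G_n` are illustrative choices; the paper itself does not restrict G to be discrete.

```python
from math import comb

def mixture_pmf(n, G):
    """P(Y = j), j = 0..n, for a discrete G given as [(theta, weight), ...]."""
    return [sum(w * comb(n, j) * t**j * (1 - t)**(n - j) for t, w in G)
            for j in range(n + 1)]

def G_n(z, n, G):
    """Distribution function of Y/n at z: total mixture mass on counts j <= n*z."""
    pmf = mixture_pmf(n, G)
    return sum(pmf[j] for j in range(n + 1) if j <= n * z)

# Hypothetical G: 90% of inputs have intensity 0.01, 10% have intensity 0.5.
G = [(0.01, 0.9), (0.5, 0.1)]
pmf = mixture_pmf(3, G)
print(pmf)
print(G_n(0.5, 3, G))
```

With a degenerate G (a single point with weight one) the same function returns ordinary binomial probabilities.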

Independence in A3 is a strong modeling assumption which can be checked
if information is available concerning the failure counts
$Y_1, Y_2, \ldots, Y_k$. Published failure data (Knight and Leveson, [4]) gives
summary information only in the form of grouped frequency counts so we proceed
as if A3 is a reasonable assumption.

To motivate the statistic considered in a later section, we now describe 
a theoretical model for a system of N versions. A system having N component 
versions (N = 1,3,5,...) fails if an input happens to fall in the subset of 
the input space where a majority $m = (N + 1)/2$ of the component versions
produce incorrect output. The probability of system failure is

$$P_N = \int \sum_{j=m}^{N} \binom{N}{j} [\theta(x)]^j [1 - \theta(x)]^{N-j} \, dQ \qquad (3)$$

In the case N = 1, (3) is the failure probability p of a single version.
Integrating (3) by substitution gives the reparameterization

$$P_N = \int \sum_{j=m}^{N} \binom{N}{j} z^j (1 - z)^{N-j} \, dG(z) \qquad (4)$$

where the integrand does not depend on any unknown parameters.
Dependent failures of the component versions are modeled if $\theta(x)$ varies
with different inputs x. If $\theta(x)$ is constant, i.e., $\theta(x) = p$ except on a
set A with Q(A) = 0, then G is a degenerate distribution and (4) reduces to a
model of independent failures. However, estimates of $P_N$ obtained on the
basis of the independence model differ substantially from estimates obtained
without this restriction so (4) with a general form of G provides a more
robust model.
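The size of this difference can be illustrated numerically. The sketch below evaluates (4) for a discrete G, once for a degenerate G (the independence model) and once for a two-point G with the same mean. The value of p is the single-version estimate reported in Table 2; the two-point G and the function name `system_failure_prob` are hypothetical choices made only for the illustration.

```python
from math import comb

def system_failure_prob(N, G):
    """P_N from (4) for a discrete G given as [(z, weight), ...]."""
    m = (N + 1) // 2
    return sum(
        w * sum(comb(N, j) * z**j * (1 - z)**(N - j) for j in range(m, N + 1))
        for z, w in G
    )

p = 0.00069978  # single-version estimate from Table 2

# Independence model: G degenerate at p.
independent = system_failure_prob(3, [(p, 1.0)])

# Hypothetical two-point G with the same mean p: most inputs never cause a
# failure, while a small fraction of inputs fail with intensity 0.2.
two_point = [(0.0, 1 - p / 0.2), (0.2, p / 0.2)]
mixture = system_failure_prob(3, two_point)

print(f"{independent:.3e}  {mixture:.3e}")
```

Both choices of G give the same single-version failure probability, yet the three-version system probabilities differ by orders of magnitude, which is why the independence restriction matters.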


5. A STATISTIC FOR ESTIMATING $P_N$.


For analyzing failure data obtained by testing a set of n versions, we
consider the following statistic:

$$\tilde{P}_N = k^{-1} \binom{n}{N}^{-1} \sum_{j=0}^{n} \sum_{l=m}^{N} \binom{j}{l} \binom{n-j}{N-l} S_{jn} \qquad (5)$$

where $\binom{a}{b} = 0$ if $b > a$ and $S_{jn} = \sum_i I(Y_i = j)$ is the input
frequency of j failures among n versions ($I(E)$ is the indicator function of
the set E). $\tilde{P}_N$ was derived (Eckhardt and Lee, [6]) by considering an
average of the estimated failure probabilities of N version systems formed by
selecting subsets of size N out of the total of n available versions.


The summary frequencies $S_{jn} = \sum_i I(Y_i = j)$, j = 0, 1, ..., n are
obtained by grouping $Y_1, Y_2, \ldots, Y_k$ on the integers j = 0, 1, ..., n.
The distribution function $G_n(z)$ of $n^{-1} Y_i$ has mass

$$\pi_j = \int \binom{n}{j} u^j (1 - u)^{n-j} \, dG(u) \qquad (6)$$

at j/n, j = 0, 1, ..., n. Note that (6) defines a mapping
$\pi = (\pi_1, \pi_2, \ldots, \pi_n)$, or $\pi = \pi(G)$, from a family of
distributions on the unit interval. If G is a member of the class of
continuous distributions on [0,1], then the range of $\pi(G)$ is limited only by
requiring that the probabilities $\pi_j$ sum to one.


Let $J = \{l(1), l(2), \ldots, l(N)\}$ be a subset of the indexing set
$\{1,2,\ldots,n\}$ for a set of n software versions. Define

$$\tilde{P}_N = k^{-1} \binom{n}{N}^{-1} \sum_{i=1}^{k} \sum_{J} I\Big(\sum_{j=1}^{N} V_{l(j)}(X_i) \ge m\Big) \qquad (7)$$

which, by changing the order of summing, simplifies to (5). As a consequence
of (7), $\tilde{P}_N$ is unbiased for $P_N$ and is a U-statistic (Serfling, [7],
p. 172) which has the desirable property of minimum variance among unbiased
statistics that depend on $S_n = (S_{1n}, S_{2n}, \ldots, S_{nn})$. (A U-statistic
is a statistic obtained from a function of the observable random variables
having an expected value equal to the parameter to be estimated; averaging
such a function over all subsets gives a U-statistic as in (7)).
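The equivalence of the subset average and the grouped form can be checked by brute force on a small example. The sketch below assumes the statistic is the average, over inputs and over all N-subsets of the n versions, of the indicator that a majority of the subset fails, and that the grouped form counts such subsets with products of binomial coefficients. The failure matrix and the function names are hypothetical.

```python
from itertools import combinations
from math import comb

def u_statistic(failures, N):
    """Average over inputs and all N-subsets of the event 'a majority fails'."""
    k, n = len(failures), len(failures[0])
    m = (N + 1) // 2
    hits = sum(
        1
        for row in failures
        for J in combinations(range(n), N)
        if sum(row[j] for j in J) >= m
    )
    return hits / (k * comb(n, N))

def grouped_form(failures, N):
    """Same quantity computed only from the failure counts Y_i."""
    k, n = len(failures), len(failures[0])
    m = (N + 1) // 2
    total = 0
    for row in failures:
        y = sum(row)
        # Number of N-subsets containing at least m of the y failed versions.
        # math.comb returns 0 when the lower index exceeds the upper one.
        total += sum(comb(y, l) * comb(n - y, N - l) for l in range(m, N + 1))
    return total / (k * comb(n, N))

# Hypothetical indicators V_j(X_i): 4 inputs (rows) by 5 versions (columns).
failures = [
    [0, 0, 0, 0, 0],
    [1, 0, 0, 0, 0],
    [1, 1, 0, 1, 0],
    [0, 1, 1, 0, 0],
]
print(u_statistic(failures, 3), grouped_form(failures, 3))
```

The grouped form needs only the counts, so its cost does not grow with the number of subsets, whereas the brute-force version enumerates every subset for every input.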

Other unbiased statistics exist, however, which are not a function of $S_n$.
One example is the statistic defined by grouping the software versions into N
sets and averaging over all selections, one version from each set.

The minimum variance property of $\tilde{P}_N$ is a consequence of $S_n$ being a
complete sufficient statistic for $\pi$ in relation to $Y_1, Y_2, \ldots, Y_k$;
$S_n$, however, is only a summary statistic in relation to the larger set
$\{V_j(X_i),\ j = 1,2,\ldots,n,\ i = 1,2,\ldots,k\}$. In staying with our purpose,
the remainder of the discussion is limited to $\tilde{P}_N$.

6. THE VARIANCE OF $\tilde{P}_N$.


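To express $\tilde{P}_N$ in a more convenient form, write

$$\tilde{P}_N = k^{-1} \sum_{i=1}^{k} \sum_{j=0}^{n} a_{nj} I(Y_i = j) \qquad (8)$$

where

$$a_{nj} = \binom{n}{N}^{-1} \sum_{l=m}^{N} \binom{j}{l} \binom{n-j}{N-l} \qquad (9)$$

Except for the constant $k^{-1}$, (8) is the sum of the quantities

$$W_{ni} = \sum_{j=0}^{n} a_{nj} I(Y_i = j), \quad i = 1,2,\ldots,k.$$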


Since $W_{ni}$ is a function only of $Y_i$, it follows from A3 that $W_{ni}$,
$i = 1,2,\ldots,k$ are independent and identically distributed random variables.
Therefore, for fixed n and k tending to infinity, $\tilde{P}_N$ has an asymptotic
normal distribution with mean $P_N$ and standard deviation $k^{-1/2}\tau$ where

$$\tau^2 = \sum_{j=0}^{n} a_{nj}^2 \pi_j - \Big(\sum_{j=0}^{n} a_{nj} \pi_j\Big)^2 \qquad (10)$$


$k^{-1/2}\tau$ measures the precision obtained when testing a given set of programs
but it does not give a true reflection of variability over a population of
programmers.
To clarify the interpretation of $\tau$, consider the special case N = 1 of
(8). If N = 1, then (8) reduces to

$$\hat{p} = k^{-1} \sum_{i=1}^{k} \sum_{j=0}^{n} (j/n) I(Y_i = j) \qquad (11)$$

which estimates the average failure probability p of a single version.
Suppose G is degenerate at the constant value of $\theta(x) = p$. In this case the
parameters defined in (6) are binomial probabilities and (10) becomes
$\tau^2 = p(1-p)/n$. This being the variance of a binomial random variable scaled
by $n^{-1}$, the quantity $\tau$ seems appropriately described as a measure of the
effect of variability over repetitions of the process of generating different
programs.
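The degenerate case is easy to confirm numerically. The sketch below assumes the variance is the second moment of the weights minus the squared first moment under the count distribution, with weights j/n for N = 1; the value p = 0.01 is a hypothetical constant intensity and `tau_squared` is an illustrative name.

```python
from math import comb

def tau_squared(a, pi):
    """Variance of the weight a_j when j is drawn with probabilities pi_j."""
    mean = sum(aj * pj for aj, pj in zip(a, pi))
    return sum(aj * aj * pj for aj, pj in zip(a, pi)) - mean**2

n, p = 27, 0.01  # hypothetical constant failure intensity

# With G degenerate at p, the count distribution is binomial(n, p).
pi = [comb(n, j) * p**j * (1 - p)**(n - j) for j in range(n + 1)]

# For N = 1 the weights reduce to j/n.
a = [j / n for j in range(n + 1)]

print(tau_squared(a, pi), p * (1 - p) / n)
```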

7. ESTIMATES OF PRECISION. 

The summary data in Table 1 was obtained (Knight and Leveson [4]) by
testing n = 27 programs on $k = 10^6$ randomly selected input conditions. There
were 2 input cases on which 8 of the versions failed together, 12 cases where
7 versions failed together, and so on.

Estimates of $P_N$, N = 1,3, and of the standard deviation, $k^{-1/2}\tau$ and
$\tau$, are given in Table 2. On average, for the given set of programs, a system
of 3 versions has a much smaller (by a factor of 19) failure probability than
a single version. When considering only the uncertainty associated with
testing the 27 versions, the standard deviation of these estimates, given by
the $k^{-1/2}\tau$ values in Table 2, indicates that these probabilities may be
estimated with high precision. The $\tau$ values in Table 2 suggest
considerable variation in the estimates if the experiment were repeated for a
different set of programmers.
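The entries of Table 2 can be recomputed from the proportions in Table 1. The sketch below assumes the estimate is the weighted sum of the Table 1 proportions with weights equal to the fraction of N-subsets in which a majority fails, and that the variance is the corresponding second moment minus the squared mean; the function names are illustrative and not taken from the paper.

```python
from math import comb, sqrt

# Table 1: proportion of inputs on which j = 0..8 of the 27 versions failed.
PI_HAT = [0.983607, 0.015138, 0.000551, 0.000343, 0.000242,
          0.000073, 0.000032, 0.000012, 0.000002]
n, k = 27, 10**6

def weights(N):
    """Fraction of N-subsets with a failing majority, given j failed versions."""
    m = (N + 1) // 2
    return [
        sum(comb(j, l) * comb(n - j, N - l) for l in range(m, N + 1)) / comb(n, N)
        for j in range(len(PI_HAT))
    ]

def estimate_and_tau(N):
    a = weights(N)
    P = sum(aj * pj for aj, pj in zip(a, PI_HAT))
    tau = sqrt(sum(aj * aj * pj for aj, pj in zip(a, PI_HAT)) - P * P)
    return P, tau

for N in (1, 3):
    P, tau = estimate_and_tau(N)
    print(N, f"{P:.8f}", f"{tau / sqrt(k):.8f}", f"{tau:.5f}")
```

After rounding, the computed values should agree with Table 2, and the ratio of the two estimates is close to the factor of 19 quoted above.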

8. CONCLUSIONS. 

The primary motivation for this paper is to give a basis for the 
contention that, with multiple program versions, small failure probabilities 
might be estimated with a reasonable degree of precision. The calculations of 
the previous section suggest high precision. However, we emphasize that the 
precision obtained refers to predicting an average failure probability when 
testing a given set of programs (i.e., the precision does not apply to making 
predictions across a population of programmers) and that our modeling 
assumptions may have considerable effect on estimates of precision. Because 
of our indifference as to the choice of one program or set of programs, a 
statistic was used which estimates an average failure probability over the 
given set of programs. 
Table 1. Failure proportions for 27 programs on $10^6$ random input cases.

Number of failed versions     Estimate of $\pi_j$

            0                     0.983607
            1                     0.015138
            2                     0.000551
            3                     0.000343
            4                     0.000242
            5                     0.000073
            6                     0.000032
            7                     0.000012
            8                     0.000002

Source: Knight and Leveson [4]

Table 2. Estimates of $P_N$, $k^{-1/2}\tau$ and $\tau$.

N     Estimate of $P_N$     $k^{-1/2}\tau$     $\tau$

1     0.00069978            0.00000616         0.00616
3     0.00003669            0.00000144         0.00144


